I install and load all the required libraries. This notebook is prepared for text mining.
library(tidyverse)
library(tidytext)
library(DT)
library(scales)
library(wordcloud)
library(wordcloud2)
library(gridExtra)
library(ngram)
library(shiny)
library(dplyr)
library(ggplot2)
library(jpeg)
library(reshape)
library(plotly)
In any 24-hour period, many things can make us happy. For example, parents feel happy when their children have dinner with them, and young people feel happy when they play games with friends. We might expect happy moments to change as people take on different roles in different phases of life. But is that true for most people? Is there a kind of thing that makes people happy regardless of life stage? Here I focus on two groups of people, married and single. Let’s find out by analyzing the happy moments from these two groups!
First, I clear any previously stored objects from the workspace. Then I import the two datasets: processed_moments.csv and the demographic data at urlfile.
rm(list=ls())
hm_data <- read.csv("/Users/SunnyZhao/Documents/GitHub/Fall2018-Proj1-SunnyZhaoly/output/processed_moments.csv")
urlfile<-'https://raw.githubusercontent.com/rit-public/HappyDB/master/happydb/data/demographic.csv'
demo_data <- read.csv(urlfile)
After importing, I combine the two data sets and keep only the columns required for the analysis, such as “cleaned_hm”, “predicted_category” and “text”. I also keep only the rows where gender is “m” or “f”, marital status is “single” or “married”, and the reflection period is 24 hours.
# Join the happy moments with the demographic data on the worker id
hm_data <- hm_data %>%
  inner_join(demo_data, by = "wid") %>%
  select(wid,
         cleaned_hm,
         gender,
         marital,
         reflection_period,
         age,
         predicted_category,
         text) %>%
  # Keep only the groups compared in this analysis
  filter(gender %in% c("m", "f")) %>%
  filter(marital %in% c("single", "married")) %>%
  filter(reflection_period %in% "24h") %>%
  mutate(reflection_period = fct_recode(reflection_period, hours_24 = "24h"))
# unnest_tokens() below expects character text
hm_data$text <- as.character(hm_data$text)
Here I show the specific texts of the happy moments. We will find interesting things in them.
Notice that each text contains whole sentences. So I split each text into words and store them in a bag of words (“bag_of_words”), which is convenient for text mining.
bag_of_words <- hm_data %>%
  unnest_tokens(word, text)
word_count <- bag_of_words %>%
  count(word, sort = TRUE)
I want to check how happy moments differ between married people and single people. So I split the original data into two groups, married and single, and then count how often each word appears in each group.
married_people <- hm_data %>%
  filter(marital %in% "married")
bag_of_words_married <- married_people %>%
  unnest_tokens(word, text)
word_count_married <- bag_of_words_married %>%
  count(word, sort = TRUE)
single_people <- hm_data %>%
  filter(marital %in% "single")
bag_of_words_single <- single_people %>%
  unnest_tokens(word, text)
word_count_single <- bag_of_words_single %>%
  count(word, sort = TRUE)
Here, I use word clouds to represent the happy moments of married and single people.
For married status:
For single status:
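A word cloud of this kind could be drawn roughly as follows. This is only a minimal sketch, not necessarily the code used in this notebook: `toy_hm` and `toy_counts_married` are made-up names with toy data standing in for `hm_data` and `word_count_married`, and the `min.freq` and `max.words` values are assumptions.

```r
library(dplyr)
library(tidytext)
library(wordcloud)

# Hypothetical toy data standing in for hm_data (only the columns needed here)
toy_hm <- tibble(
  marital = c("married", "married", "single", "single"),
  text    = c("dinner husband son", "son birthday dinner",
              "friend game friend", "friend movie game")
)

# Word frequencies for one marital group, as in word_count_married
toy_counts_married <- toy_hm %>%
  filter(marital == "married") %>%
  unnest_tokens(word, text) %>%
  count(word, sort = TRUE)

# Draw the cloud; min.freq = 1 only because the toy data is tiny
set.seed(1)  # word placement is random, so fix the seed for a repeatable layout
wordcloud(words = toy_counts_married$word,
          freq = toy_counts_married$n,
          min.freq = 1,
          max.words = 100,
          random.order = FALSE)
```

With the real data, the same call would take `word_count_married$word` and `word_count_married$n` (or the single-people counterparts) as its `words` and `freq` arguments.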
Here, we can see that the top 5 words in single people’s happy moments differ from those of married people. For single people, the top 5 words are: friend, time, day, watched and played. For married people, the top 5 words are: time, day, friend, son and husband. This suggests that when we’re single, we have more freedom and time to do what we want. After marriage, we focus on family and kids, since we have different responsibilities and play a different role than before. This raises another question: from single to married, do the happy moments of the last 24 hours change more for men or for women?
In this part, I want to know whether marriage brings more happy moments to men or to women. So I check the top 10 high-frequency two-word phrases in their happy moments and list them in the following table:
##    Single male    Single female    Married male      Married female
## 1  video game     spend time       spend time        spend time
## 2  watched movie  ice cream        video game        moment life
## 3  ice cream      moment life      watched movie     watched movie
## 4  played video   summer vocation  moment life       husband home
## 5  spend time     nice day         dinner wife       ice cream
## 6  moment life    ready office     ice cream         read book
## 7  met friend     mother tickle    moment feel       dinner night
## 8  played game    watched movie    define happiness  husband dinner
## 9  talked friend  future life      makes feel        cooked dinner
## 10 event hours    tickle laugh     played video      dinner family
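The code behind this table is not shown. One way such a ranking might be computed is to tokenize the text into two-word phrases and count them per gender and marital status. The sketch below is an assumption about the approach, not the original code; `toy_moments` and `top_bigrams` are hypothetical names, and the toy sentences are made up.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data standing in for hm_data
toy_moments <- tibble(
  gender  = c("m", "m", "f", "f"),
  marital = c("single", "single", "married", "married"),
  text    = c("played video game", "video game night",
              "husband cooked dinner", "husband cooked breakfast")
)

top_bigrams <- toy_moments %>%
  # Split each text into overlapping two-word phrases
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  # Texts with fewer than two words yield NA; drop them
  filter(!is.na(bigram)) %>%
  count(gender, marital, bigram, sort = TRUE) %>%
  # Keep the 10 most frequent phrases within each group
  group_by(gender, marital) %>%
  slice_max(n, n = 10, with_ties = FALSE) %>%
  ungroup()

print(top_bigrams)
```

Applied to `hm_data`, each gender and marital combination would give one column of the table above.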
Here we notice that “video game” and “watched movie” are always important for men, whether single or married. However, “dinner wife” also appears for married men, so men attach importance to their family as well. For women, the high-frequency phrases change a lot. We find some new phrases: “husband home”, “dinner night”, “husband dinner” and “dinner family”. This suggests that married women pay more attention to their family, as we might expect. There is also one interesting thing: “ice cream” creates happy moments for all four groups.
In this part, I want to know whether family brings more happy moments to men or to women. I use the “ggplot2” and “plotly” packages for data visualization. I check the top 6 high-frequency phrases that appear in BOTH groups’ happy moments to find interesting things. I compare two pairs of groups: single male vs. married male, and single female vs. married female.
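A side-by-side comparison of this kind could be plotted along the following lines. This is a hedged sketch only: `toy_bigram_counts` is a hypothetical data frame, its counts are invented for illustration and are not results from the data, and the dodged bar chart is just one possible design.

```r
library(dplyr)
library(ggplot2)

# Hypothetical counts for illustration only; these are NOT results from the data
toy_bigram_counts <- tibble(
  status = rep(c("single male", "married male"), each = 3),
  bigram = rep(c("video game", "watched movie", "ice cream"), times = 2),
  n      = c(12, 10, 8, 9, 6, 7)
)

# Dodged bars place the two statuses next to each other for every phrase
p <- ggplot(toy_bigram_counts, aes(x = bigram, y = n, fill = status)) +
  geom_col(position = "dodge") +
  coord_flip() +
  labs(x = NULL, y = "Frequency", fill = "Status")

print(p)
```

Passing `p` to `plotly::ggplotly()` would turn the static chart into an interactive one.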
From above, we can see that marital status does not change men’s happy moments much. We notice two things. First, the frequency of “watched movie” decreases from 126 to 69 for men, but it increases for women. Second, more happy moments come from dinner and family. For single women, words like “family” and “dinner” do not come up often, but they become high-frequency words in their happy moments after marriage. This shows that people take on different responsibilities and roles in different periods of life. Finally, ice cream is really powerful!
But here we also notice one interesting thing: the frequency of “dinner family” decreases after men marry, but clearly increases for women. Why? Maybe the next time you have dinner with your parents, you can ask your father whether he feels happy or not. (Be careful: do not let your mother hear the question.)
By analyzing the happy moments of people with different marital status, we get the following results:
● When we are single, “friend”, “game”, “home” and “dog” create more happy moments for us;
● When we get married, family, husband/wife or kids create more happy moments for us;
● Whether married or single, “friend”, “ice cream” and “watch movie” are always important;
● Happy moments for men change only a little from single to married. Most happy moments come from “watched movie” and “video game”. Family also brings more happy moments for married men, even though the increase is small;
● Happy moments for women change a lot from single to married. The most frequent happy moments shift from “watch movie” and “summer vocation” to “husband” and “family”.